SpamCooling: A Parallel Heterogeneous Ensemble Spam Filtering System Based on Active Learning Techniques

نویسندگان

  • Jinlong Wang
  • Ke Gao
  • Huy Quan Vu
چکیده

Anti-spam technology is developing rapidly in recent years. With the emerging applications of machine learning in diverse fields, researchers as well as manufacturers around the world have attempted a large number of related algorithms to prevent spam. In this paper, we designed an effective anti-spam protection system, SpamCooling, based on the mechanism of active learning and parallel heterogeneous ensemble learning techniques. The system adopts a batch method to filter spam and can be easily incorporated with existing mail clients (MUA). It can actively obtain user feedbacks for providing users with personalized spam filtering experiences. The parallel heterogeneous ensemble method can help system achieve high spam detection rate as well as low ham misclassification rate.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

متن کامل

A Comparison of Ensemble and Case-Base Maintenance Techniques for Handling Concept Drift in Spam Filtering

The problem of concept drift has recently received considerable attention in machine learning research. One important practical problem where concept drift needs to be addressed is spam filtering. The literature on concept drift shows that among the most promising approaches are ensembles and a variety of techniques for ensemble construction has been proposed. In this paper we consider an alter...

متن کامل

Evaluating ensemble classifiers for spam filtering

In this study, the ensemble classifier presented by Caruana, Niculescu-Mizil, Crew & Ksikes (2004) is investigated. Their ensemble approach generates thousands of models using a variety of machine learning algorithms and uses a forward stepwise selection to build robust ensembles that can be optimised to an arbitrary metric. On average, the resulting ensemble out-performs the best individual ma...

متن کامل

Single-Class Learning for Spam Filtering: An Ensemble Approach

Spam, also known as Unsolicited Commercial Email (UCE), has been an increasingly annoying problem to individuals and organizations. Most of prior research formulated spam filtering as a classical text categorization task, in which training examples must include both spam emails (positive examples) and legitimate mails (negatives). However, in many spam filtering scenarios, obtaining legitimate ...

متن کامل

A Machine Learning Approach to Server-side

Spam-detection systems based on traditional methods have several obvious disadvantages like low detection rate, necessity of regular knowledge bases’ updates, impersonal filtering rules. New intelligent methods for spam detection, which use statistical and machine learning algorithms, solve these problems successfully. But these methods are not widespread in spam filtering for enterprise-level ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • JCIT

دوره 5  شماره 

صفحات  -

تاریخ انتشار 2010